Wake up remote Claude Code agents on new events#9399

Merged
kjankov merged 19 commits into master from
katarina/quality-482-wake-up-remote-agent-on-new-events
May 2, 2026

Conversation

@kjankov
Contributor

@kjankov kjankov commented Apr 29, 2026

Description

Fixes orchestration v2 parent/child agent wake-up and messaging behavior for remote child agents.

This PR updates the client-side orchestration flow so that incoming parent-agent messages wake remote child agents through the server run follow-up path instead of trying to treat them like local dormant Claude harnesses. Previously, a remote child could receive the parent’s message, but it would not be restarted correctly in a harness and could fail or hang when trying to send a message back to the parent.

Main changes:

• Adds a remote-child wake path in the blocklist AI controller:
◦ detects remote child conversations with pending parent-agent message events
◦ submits a run follow-up to agent/runs/{run_id}/followups
◦ removes delivered pending message events after successful follow-up submission
◦ retries/logs failures instead of silently hanging
• Keeps local dormant Claude wake behavior separate from remote child wake behavior.
• Restores remote hidden child panes as cloud/ambient agent panes instead of local terminal-backed child panes.
• Ensures restored remote child panes enter the existing ambient session in AgentRunning state.
• Persists and restores remote-child conversation metadata so the client can distinguish local children from remote children across reloads.
• Improves orchestration v2 message sending:
◦ uses task-scoped server APIs when available
◦ adds bounded timeouts and error logging for send failures
◦ surfaces failures instead of leaving action execution indefinitely pending
• Adds regression coverage for:
◦ remote child conversation restoration
◦ remote child pane/session state
◦ task-scoped ambient agent messaging
◦ orchestration v2 message/error behavior
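
The remote-child wake path described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PendingMessage`, `WakeOutcome`, `wake_remote_child`, and the `submit` callback are hypothetical names, and `submit` stands in for whatever client call posts to `agent/runs/{run_id}/followups`. The point it shows is the ordering: an event is removed from the pending queue only after its follow-up was accepted, and failures are retried a bounded number of times and logged.

```rust
use std::collections::VecDeque;

/// A parent-agent message event that has not yet reached the remote child.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct PendingMessage {
    pub event_id: u64,
    pub body: String,
}

/// Result of one wake attempt for a single remote child.
#[derive(Debug, PartialEq)]
pub enum WakeOutcome {
    NothingPending,
    Delivered(usize),
    Failed { attempts: u32, error: String },
}

/// Drains `pending` by submitting each message as a run follow-up.
///
/// `submit` is an assumed stand-in for the request to
/// `agent/runs/{run_id}/followups`; it receives the run id and message body.
pub fn wake_remote_child(
    run_id: &str,
    pending: &mut VecDeque<PendingMessage>,
    max_attempts: u32,
    mut submit: impl FnMut(&str, &str) -> Result<(), String>,
) -> WakeOutcome {
    if pending.is_empty() {
        return WakeOutcome::NothingPending;
    }
    let mut delivered = 0;
    // Peek rather than pop so a failed message stays queued for a later wake.
    while let Some(message) = pending.front().cloned() {
        let mut attempts = 0;
        loop {
            attempts += 1;
            match submit(run_id, &message.body) {
                Ok(()) => break,
                Err(error) if attempts >= max_attempts => {
                    eprintln!("follow-up for run {run_id} gave up after {attempts} attempts: {error}");
                    return WakeOutcome::Failed { attempts, error };
                }
                Err(error) => {
                    eprintln!("follow-up attempt {attempts} for run {run_id} failed: {error}");
                }
            }
        }
        // Remove the event only after the server accepted the follow-up.
        pending.pop_front();
        delivered += 1;
    }
    WakeOutcome::Delivered(delivered)
}
```

Returning a `Failed` outcome instead of looping forever is what lets the caller surface the error rather than hang; the undelivered message stays at the front of the queue for the next wake.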

Testing

Server API dependencies

  • Is this change necessary to make the client compatible with a desired server API breaking change?
  • Does this change rely on a new server API?
    • If so, is the use of this API restricted to client channels that rely on the staging server (e.g. WarpDev)?
  • Is this change enabling the use of a server API on client channels that rely on the production server (e.g. WarpStable)?
    • If so, has the new server API been stable on production for at least one server release cycle? See here for more details.

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

@cla-bot cla-bot Bot added the cla-signed label Apr 29, 2026
@kjankov kjankov marked this pull request as ready for review April 29, 2026 12:08
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented Apr 29, 2026

@kjankov

I'm starting a first review of this pull request.

I completed the review and posted feedback on this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

You can view the conversation on Warp.

Powered by Oz

Contributor

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR adds task-scoped orchestration messaging, remote-child wake-up via run followups, restored remote child pane handling, and persistence for remote-child metadata.

Concerns

  • The CLI now rejects --task-id with --conversation, which can break existing worker follow-up invocations that still pass both before the task-backed resume path runs.
  • V1 restored child lifecycle behavior appears to have been removed, and the poller restore path is no longer gated on OrchestrationV2, so disabled-v2 restores can regress.
  • The remote-child follow-up submission has no bounded timeout, so a stalled request can leave the wake permanently pending.
  • The local Claude wake path can mark the server task InProgress before verifying the conversation is still ready, leaving task state inconsistent if the final readiness check fails.
  • Security pass: no separate security-specific findings beyond the correctness issues above.
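
To make the timeout concern concrete, here is one way a blocking submission could be bounded using only the standard library. It is a sketch under stated assumptions: `submit_with_timeout` and `SubmitError` are hypothetical names, the request is modeled as a plain closure, and the real client is presumably async, in which case the runtime's own timeout combinator would be the natural tool instead of a worker thread.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Why a bounded submission did not succeed. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub enum SubmitError {
    TimedOut,
    Failed(String),
}

/// Runs `request` on a worker thread and waits at most `timeout` for its result.
pub fn submit_with_timeout<F>(request: F, timeout: Duration) -> Result<(), SubmitError>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out; ignore that.
        let _ = tx.send(request());
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(())) => Ok(()),
        Ok(Err(message)) => Err(SubmitError::Failed(message)),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(SubmitError::TimedOut),
        // The worker dropped the sender without sending, e.g. it panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err(SubmitError::Failed("worker exited without a result".to_string()))
        }
    }
}
```

With this shape a stalled request yields `SubmitError::TimedOut`, so the wake can be retried or reported instead of staying pending. Note that the worker thread is not cancelled on timeout; it simply finishes in the background.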

Verdict

Found: 0 critical, 5 important, 0 suggestions

Request changes

Comment thread crates/warp_cli/src/agent.rs Outdated
Comment thread app/src/ai/blocklist/orchestration_events.rs
Comment thread app/src/ai/blocklist/orchestration_event_poller.rs Outdated
Comment thread app/src/ai/blocklist/controller.rs Outdated
Comment thread app/src/ai/blocklist/controller.rs Outdated
Co-Authored-By: Oz <oz-agent@warp.dev>
@kjankov kjankov requested a review from cephalonaut April 29, 2026 12:41
kjankov and others added 3 commits April 29, 2026 09:11
Co-Authored-By: Oz <oz-agent@warp.dev>
Co-Authored-By: Oz <oz-agent@warp.dev>
…-new-events

Co-Authored-By: Oz <oz-agent@warp.dev>
Contributor

@cephalonaut cephalonaut left a comment

I think the follow-up-based remote child wake-ups ought to be driven from the server side. One good reason, I think, is that we will ultimately support arbitrary subscriptions to event sources, and having the wake-ups driven from the event source side seems impractical. Let me know if that doesn't make sense!

Given that, I think the Claude wake-up code also ought to move. Most of it is harness-specific: Claude transcript envelopes, session-index files, claude --resume command construction, parent-bridge state staging. You could move it to a wake_driver.rs alongside the parent bridge and expose a single wake_dormant_session method to the controller. The controller keeps the EventsReady subscription, readiness gating, ExecuteLocalHarnessCommand emission, and conversation-status update; the harness owns eligibility, context fetch, on-disk staging, command construction, and reopen/rollback.
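
The split suggested here could look roughly like the sketch below. Everything in it is illustrative: `WakeDriver`, `WakeDecision`, `ClaudeWakeDriver`, and `Controller` are hypothetical stand-ins, not the types in this PR, and the argument passed to `claude --resume` is a placeholder. It only aims to show the boundary: the controller does readiness gating and command emission, while the driver decides eligibility and builds the command behind a single `wake_dormant_session` method.

```rust
use std::collections::HashSet;

/// What the harness decided for one dormant conversation. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub enum WakeDecision {
    NotEligible,
    Resume { command: String },
}

/// The single entry point the controller would see.
pub trait WakeDriver {
    fn wake_dormant_session(&mut self, conversation_id: &str) -> Result<WakeDecision, String>;
}

/// Toy harness-side driver: owns eligibility and command construction.
pub struct ClaudeWakeDriver {
    pub dormant_sessions: HashSet<String>,
}

impl WakeDriver for ClaudeWakeDriver {
    fn wake_dormant_session(&mut self, conversation_id: &str) -> Result<WakeDecision, String> {
        // Eligibility: only sessions this driver knows to be dormant can be woken.
        if !self.dormant_sessions.remove(conversation_id) {
            return Ok(WakeDecision::NotEligible);
        }
        // Context fetch and on-disk staging would happen here in a real driver.
        Ok(WakeDecision::Resume {
            command: format!("claude --resume {conversation_id}"),
        })
    }
}

/// Toy controller: keeps readiness gating and command emission only.
pub struct Controller<D: WakeDriver> {
    driver: D,
    pub emitted_commands: Vec<String>,
}

impl<D: WakeDriver> Controller<D> {
    pub fn new(driver: D) -> Self {
        Self { driver, emitted_commands: Vec::new() }
    }

    /// Returns true when a resume command was emitted for the conversation.
    pub fn on_events_ready(&mut self, conversation_id: &str, conversation_ready: bool) -> bool {
        if !conversation_ready {
            return false;
        }
        match self.driver.wake_dormant_session(conversation_id) {
            Ok(WakeDecision::Resume { command }) => {
                self.emitted_commands.push(command);
                true
            }
            Ok(WakeDecision::NotEligible) => false,
            Err(error) => {
                eprintln!("wake for {conversation_id} failed: {error}");
                false
            }
        }
    }
}
```

Because the controller depends only on the trait, a different harness could supply its own driver without the controller learning about transcripts or session files.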

Comment thread crates/http_client/src/lib.rs
Comment thread app/src/terminal/view/ambient_agent/model.rs
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR adds remote-child orchestration wake/messaging plumbing, task-scoped agent messaging APIs, persisted remote-child metadata, restored hidden child pane behavior, and a new Codex harness path.

Concerns

  • Remote child pending message events are drained locally without submitting the advertised run follow-up, which can drop the wake signal and leave the child agent dormant.
  • The new Codex auth seeding writes API keys into an existing auth file without correcting permissive permissions.
  • Codex harness conversion for server GraphQL values is brittle and can still map Codex runs to Unknown.
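
For the auth-file concern, a minimal Unix-only sketch of writing a secret while also tightening a pre-existing file's permissions is shown below. `write_secret_file` is a hypothetical helper, not the function in this PR. The detail it illustrates is that the mode passed to `OpenOptions` applies only when the file is created, so an existing file with looser permissions has to be corrected explicitly before the key is written.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Writes `contents` to `path`, leaving the file readable and writable by the
/// owner only (0o600), whether or not the file already existed.
pub fn write_secret_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        // Applied only if the file is newly created.
        .mode(0o600)
        .open(path)?;
    // Tighten permissions on a pre-existing file before any secret is written.
    file.set_permissions(fs::Permissions::from_mode(0o600))?;
    file.write_all(contents)?;
    file.sync_all()
}
```

Setting permissions before `write_all` means the key never sits in a world-readable file, even briefly.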

Verdict

Found: 1 critical, 2 important, 0 suggestions

Request changes

Comment thread app/src/ai/agent_sdk/driver/harness/codex.rs
Comment thread app/src/ai/blocklist/controller.rs Outdated
Comment thread app/src/server/server_api/ai.rs Outdated
Contributor

@cephalonaut cephalonaut left a comment

Second-pass review after the wake refactor. Most comments are small (dead code, fragile match, possible regression, asks for in-code documentation, two extraction nits). Two larger architectural notes are embedded in the comments on maybe_prepare_local_claude_wake and prepare_local_wake_command: the wake decision belongs in a harness-side supervisor, and the streamer should not subscribe to dormant Claude-harness conversations at all, which collapses most of the wake's bookkeeping.

Comment thread app/src/ai/blocklist/controller.rs Outdated
Comment thread app/src/server/server_api/ai.rs
Comment thread app/src/server/server_api/ai.rs Outdated
Comment thread app/src/ai/blocklist/controller.rs
Comment thread app/src/ai/agent_sdk/ambient.rs Outdated
Comment thread app/src/ai/agent_sdk/driver/harness/claude_code/parent_bridge.rs
Comment thread app/src/ai/blocklist/controller.rs Outdated
Comment thread app/src/ai/blocklist/controller.rs Outdated
Comment thread app/src/ai/blocklist/controller.rs Outdated
Comment thread app/src/ai/agent_sdk/driver/harness/claude_code/wake_driver.rs
Co-Authored-By: Oz <oz-agent@warp.dev>
@kjankov kjankov requested a review from cephalonaut May 1, 2026 09:53
Contributor

@cephalonaut cephalonaut left a comment

Third pass after the wake-listener fix. The fix to the wake-trigger gap looks good. Three small things below.

.await
}
None => ai_client.send_agent_message(request).await,
}
Contributor

send_agent_message_with_timeout has no timeout on wasm

Not sure this is really invoked from wasm, but maybe add a comment.

if me.is_dormant_claude_wake_listener_eligible(conversation_id, ctx) {
me.start_dormant_claude_wake_listener(conversation_id, ctx);
}
}
Contributor

Wake-listener restart has no backoff

On Ok(None) or Err the callback immediately restarts the wake-only listener if still eligible, with no attempt counter at this layer. The inner AgentEventDriverConfig::retry_forever backs off across SSE failures, but a clean-but-empty stream close would loop tight. Add a small backoff or a bounded retry count for the outer restart.
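
A small capped exponential backoff with a bounded attempt budget, as suggested, might look like this. `RestartBackoff` is a hypothetical helper and the parameters are arbitrary; the caller would wait for the returned delay before restarting the listener, stop restarting when `None` comes back, and call `reset` once a stream has delivered an event.

```rust
use std::time::Duration;

/// Capped exponential backoff with a bounded number of restarts.
pub struct RestartBackoff {
    attempts: u32,
    max_attempts: u32,
    base: Duration,
    cap: Duration,
}

impl RestartBackoff {
    pub fn new(max_attempts: u32, base: Duration, cap: Duration) -> Self {
        Self { attempts: 0, max_attempts, base, cap }
    }

    /// Delay to wait before the next restart, or `None` once the budget is spent.
    pub fn next_delay(&mut self) -> Option<Duration> {
        if self.attempts >= self.max_attempts {
            return None;
        }
        // Doubles on every attempt: base, 2*base, 4*base, ... limited by `cap`.
        let factor = 2u32.saturating_pow(self.attempts);
        self.attempts += 1;
        Some(self.base.saturating_mul(factor).min(self.cap))
    }

    /// Call after a healthy stream so the next failure starts from `base` again.
    pub fn reset(&mut self) {
        self.attempts = 0;
    }
}
```

Even a short base delay is enough to keep a clean-but-empty stream close from turning into a tight restart loop.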

tx,
self_run_id,
hydrator,
hydrate_new_messages: true,
Contributor

hydrate_new_messages — dead flag in production?

The struct field is only set to false in the test sse_forwarding_consumer_skips_message_hydration_when_disabled. start_sse_connection (this line) hardcodes true. Either remove it (YAGNI) or document the upcoming caller that will toggle it.

@kjankov kjankov merged commit 1148ae3 into master May 2, 2026
24 checks passed
@kjankov kjankov deleted the katarina/quality-482-wake-up-remote-agent-on-new-events branch May 2, 2026 19:04